Twisted Tree Heartrot Hill Revisited

Recently, while re-examining PAGES2K, the current paleoclimate darling, I noticed that PAGES2K (2019) reverted to a variation of the Twisted Tree Heartrot Hill (Yukon) [TTHH] tree ring chronology that we had already criticized in 2003 as being obsolete when used by Mann et al 1998.  PAGES2K was supposed to be an improvement on Mann et al 1998 data, but, in many ways, it’s even worse. So it was very strange to observe the 2019 re-cycling of a TTHH version that had been criticized in 2003 as already obsolete in 1998.

MM2003

In McIntyre and McKitrick (2003), we had observed that MBH98 had used an obsolete version of the Twisted Tree Heartrot Hill (Yukon) [TTHH] tree ring chronology, for which measurement data ended in 1975, as compared to the chronology ending in 1992 available at the NOAA archive, as shown in the excerpt below. (I checked with NOAA and verified that the updated chronology was available at NOAA prior to submission of MBH98.)

The TTHH chronology declined precipitously in the late 1970s and 1980s, reaching its lowest value in the entire record in 1991. However, the MBH version ended in 1980 (extrapolating the 1975 value for the final 5 years). Below is the comparison from our 2003 article.

Mann et al 2003 Response

In our 2003 replication, we used the NOAA version of the TTHH chronology rather than the obsolete MBH version. In their contemporary (November 2003) response to our first article, Mann et al objected vehemently, claiming that we had wrongly substituted a “shorter version” of the TTHH chronology for the “longer” version used in MBH98. (The so-called “shorter” version used a larger dataset, but began when 5 cores were available.)

Because the MBH98 proxy reconstruction ended in 1980, the difference between the two versions wasn’t an important issue in the main narrative of MBH hockey stick controversies, but it does become relevant for reconstructions ending in 2000 (such as PAGES2K).

PAGES2K (2019) Version

PAGES2K, including its 2019 version, reverted to the TTHH data version already obsolete in Mann et al 1998 – the data ending in 1975, not 1992.  The figure below compares the TTHH version in PAGES2K (2019) – on the right – to the TTHH versions discussed above. The PAGES2K version uses the same measurement data (ending in 1975) as the MBH98 version.  The PAGES2K chronology is very similar to the MBH98 version in the period of overlap (1550-1975) but is not exactly the same. Notice that the PAGES2K version (like MBH98) avoids the post-1975 data with the severe “decline”.

The precise provenance of PAGES2K chronology versions is not reported and figuring them out by reverse engineering is a herculean effort (e.g. Soderqvist’s work on PAGES2K Asian chronologies.)  Amusingly, the PAGES2K version begins a little later (1550) than the version that Mann had criticized for being “shorter”.

 

Measurement Data

Although Jacoby and D’Arrigo’s contemporary NOAA archive included the TTHH chronology up to 1992 (with its decline), they never archived the measurement data corresponding to the 1992 chronology.  Many years later (2014), as Jacoby was on his deathbed, they filed a large archive of measurement data with NOAA, including data for a Yukon regional chronology (cana326), a subset of which was the 1975 TTHH measurement data. This archive included the TTHH update, but did not include a concordance identifying which identifiers belonged to the TTHH update and which belonged to other Yukon locations.

By coincidence, Tom Melvin, Briffa’s associate at the University of East Anglia, had used a TTHH measurement data version (74 cores) as a benchmark for testing “signal free” methodology in 2010 and this measurement data proved to be available in an archive identified by Hampus Soderqvist in his investigations of signal-free methodology. It contained the 1975 measurement data (34 cores), 25 cores from 1987-1992 and 15 cores from 1999 sampling. 

As an exercise, I calculated a chronology, using conventional methodology, from the subset of cores collected in 1992 or earlier – shown below.  It closely matches the chronology archived at NOAA in the mid-1990s.
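For concreteness, here is a minimal dplR sketch of that exercise; the file name is a placeholder, and by “conventional methodology” I mean dplR’s ModNegExp detrending followed by its biweight robust mean chronology:

    library(dplR)

    # Read the 74-core measurement file (file name is a placeholder)
    rwl <- read.rwl("tthh_melvin.rwl")

    # Keep only cores whose last measured ring is 1992 or earlier
    yrs <- as.numeric(rownames(rwl))
    last_ring <- sapply(rwl, function(x) max(yrs[!is.na(x)]))
    rwl_sub <- rwl[, last_ring <= 1992]

    # Conventional standardization: modified negative exponential detrending,
    # then a Tukey biweight robust mean chronology
    rwi <- detrend(rwl_sub, method = "ModNegExp")
    crn <- chron(rwi)
    plot(crn, add.spline = TRUE, nyrs = 30)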

TTHH 1999 Update

The TTHH measurement data was updated a second time in 1999. The results were published in a 2004 article by D’Arrigo et al entitled “Thresholds for warming-induced growth decline at elevational tree line in the Yukon Territory, Canada” (link), in which they broached the problem of the “decline” in high-latitude tree ring widths in the late 20th century, despite observed warming in the Arctic.  Climate Audit readers will recall D’Arrigo’s “explanation” of the “divergence problem” to the NAS panel in 2006, when she explained that you “have to pick cherries if you want to make cherry pie”.

The type case in D’Arrigo et al 2004 was TTHH as shown below. 

D’Arrigo et al never archived the measurement data or chronology for their 2004 article on the divergence problem.  As another exercise, I calculated a chronology for the Melvin data including cores from the 1999 update using a conventional methodology (dplR ModNegExp): it closely replicated the D’Arrigo diagram from 1575 on, but not in the earliest portion (when there are fewer than 5 cores anyway.) 

Melvin Signal-Free Version

As a final exercise, I looked at Melvin’s “signal-free” methodology on the resulting tree ring chronology. (On previous occasions, we’ve discussed the perverse results of this methodology on multiple PAGES2K Asian tree ring chronologies, as articulated by Soderqvist.)  In this case, the signal-free artifact at the end of the series increases closing values by about 20% – much less dramatic than the corresponding artifact for paki033 but an artifact nonetheless.  
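For readers who want to poke at this themselves: recent versions of dplR include an ssf() implementation of the Melvin-Briffa signal-free iteration (not necessarily identical to Melvin’s own code). A rough way to look for an end-of-series artifact of the kind described above is to compare its output against a conventional chronology; the file name is again a placeholder and ssf() defaults are assumed:

    library(dplR)

    rwl <- read.rwl("tthh_melvin.rwl")   # placeholder file name, as above

    # Conventional chronology for reference
    crn_std <- chron(detrend(rwl, method = "ModNegExp"))

    # Signal-free chronology (dplR's ssf(); defaults assumed)
    crn_sf <- ssf(rwl)

    # Ratio of the two chronologies in the closing years: values well above 1
    # at the end are the sort of closing artifact discussed above
    tail(crn_sf[, 1] / crn_std[, 1], 10)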

Conclusion

In any event, PAGES2K did not use the Melvin signal-free version – which went to 1999 and incorporated a decline after 1975. As noted above, PAGES2K reverted to the measurement data version already obsolete in Mann et al 1998, but in a novel version. At present, the provenance and exact methodology of the PAGES2K calculation is unknown.  As readers are aware, it took a heroic effort by Soderqvist to deduce the methodology and provenance of multiple PAGES2K Asian tree ring chronologies (a particular LDEO variation of “signal-free” methodology). My guess is that the TTHH chronology used in PAGES2K was also calculated by LDEO (Cook et al) using some variation of Melvin iteration: the closing uptrend in the difference is characteristic.

 

D’Arrigo et al 2006: NWNA Alaska

Today’s article is about one of the D’Arrigo et al 2006 datasets.

D’Arrigo et al 2006, then under submission, had been cited in drafts of the IPCC Fourth Assessment Report. I had been accepted as an IPCC reviewer and, in that capacity, asked IPCC to make the data available to me or to ask the lead author to do so. That prompted a vehement refusal, which I documented in March 2007 (link).  Readers unfamiliar with the severity of data obfuscation by the climate science community should read that exchange.  (Some further light on the campaign emerged later in the Climategate emails.)

D’Arrigo et al 2006 calculated more than a dozen new regional chronologies, but refused to archive or provide the digital chronologies until April 2012, more than six years later (by which time the paleo field purported to have “moved on”).  At the same time, D’Arrigo et al provided information (somewhat sketchy) on which sites had been used in the various reconstructions, but measurement data for many of the sites was unavailable, including (and especially) the sites that had been sampled by D’Arrigo, Jacoby and their associates.  Much of this data was archived in April 2014, a few months before Jacoby’s death. But even this archive was incomplete.

By then, D’Arrigo et al 2006 was well in the rear view mirror of the paleo community and there has been little, if any, commentary on the relationship of the belated 2014 data archive to the 2006 article.

In several recent posts, I’ve discussed components of D’Arrigo’s Northwest Alaska (NWNA) regional chronology, which, prior to 2012, had only been available in the muddy form shown below.

The NWNA series goes from AD1297 to AD2000 and closes on a high note – as shown more clearly in the top panel below, which re-plots the post-1800 period of the NWNA chronology (RCS version; the STD version is very similar).  Also shown in this figure (bottom panel) is the post-1800 period of the chronology (ModNegExp) for the Dalton Highway (ak104) site, the only component of the NWNA composite with values in the 1992-2000 period (shown to the right of the red dashed line).

Look at the difference right of the dashed line at AD1990.  In the underlying Dalton Highway data, the series ends at almost exactly the long-term average, whereas the same data incorporated into D’Arrigo’s NWNA regional composite closes at record or near-record highs for the post-1800 period.

If the 1992-2000 Dalton Highway data doesn’t show record highs for the site chronology, then it is implausible to claim that it shows record highs for the regional chronology.  So what’s going on here?

My guess is that the regional chronology has mixed sites with different average widths and that their rudimentary statistical technique didn’t accommodate those differences.  If so, this would be the same sort of error that we saw previously with Marcott et al 2013, in which there was a huge 20th century jump without any increase in the component series (simply from a low-value series ending earlier).  Needless to say, these errors always go in a hockey stick direction.
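The suspected mechanism is easy to reproduce synthetically (this is an illustration of the mechanism, not the D’Arrigo calculation itself): average two trendless series with different mean ring widths and let the low-mean series end early.

    # Two flat "sites" with different average widths; the low-growth site
    # stops being sampled in 1950
    set.seed(1)
    yrs <- 1800:2000
    lo <- rnorm(length(yrs), mean = 0.4, sd = 0.05)
    hi <- rnorm(length(yrs), mean = 1.2, sd = 0.05)
    lo[yrs > 1950] <- NA

    # Naive composite: jumps from ~0.8 to ~1.2 at 1951, with no trend in
    # either component
    composite <- rowMeans(cbind(lo, hi), na.rm = TRUE)
    plot(yrs, composite, type = "l")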

 

Sheenjek, Alaska: A Jacoby-MBH Series

MBH98 used three Jacoby tree ring chronologies from Alaska: Four Twelve (ak031) – discussed here, Arrigetch (ak032) and Sheenjek (ak033). Sheenjek will be discussed in this article.

In our compilation of MBH98 in 2003, we observed that the Sheenjek chronology archived at NOAA Paleo was not the same as the “grey” version used in MBH98.   While we used the MBH98 version to benchmark our emulation of the MBH98 algorithm, we used the version archived at NOAA in our sensitivity analysis, both in our 2003 article and in our early 2004 submission to Nature.  In his reply to our submission, Mann vehemently protested that the “introduc[tion of] an extended version of another Northern Treeline series not available prior to AD 1500 at the time of MBH98” “introduce[d] problems into the important Northern Treeline dataset used by MBH98”:

Finally, MM04 introduce problems into the important Northern Treeline dataset used by MBH98. Aside from incorrectly substituting shorter versions of the “Kuujuag” and TTHH Northern Treeline series for those used by MBH98, and introducing an extended version of another Northern Treeline series not available prior to AD 1500 at the time of MBH98, they censored from the analysis the only Northern Treeline series in the MBH98 network available over the AD 1400-1500 interval, on the technicality that it begins only in AD 1404 (MBH98 accommodated this detail by setting the values for AD 1400-1404 equal)

The other “Northern Treeline series” referred to here was the Sheenjek chronology ak033.crn.  I checked Mann’s assertion that the data was “not available prior to AD1500 at the time of MBH98”. It was contradicted by NOAA, who confirmed that the chronology that we had used had been available since the early 1990s.

In the figure below, I’ve compared three Sheenjek chronology versions:

  • the MBH98 version from 1580-1979 (plus 1980 infill);
  • the ModNegExp chronology (dplR) calculated from measurement data (ak033.rwl), which, in this case, has been available since the 1990s and covers the period 1296-1979;
  • the archived chronology at NOAA (ak033.crn), also covering the period 1296-1979.

The issues relating to Sheenjek are different than observed at Four Twelve.

  • The MBH98 version and the chronology freshly calculated from the measurement data (rwl) using the ModNegExp option (emulating contemporary Jacoby technique) are very, very similar over their period of overlap (1580-1979).  Neither shows elevated 20th century values or a closing uptick; if anything, there is a modest decline in the late 20th century.
  • However, the MBH98 version excludes all values prior to AD1580.  There is no good reason for this exclusion: there are 28 cores in ak033.rwl in 1579, far above usual minimums.  In the 15th century, there are more cores for Sheenjek than for the Gaspe series, which was used by MBH98 in its AD1400 network even when it had only one core (and no cores at all for the first five years).
  • The Sheenjek chronology archived at NOAA (ak033.crn) was clearly derived from the ak033.rwl dataset, as the series in the middle and bottom panels are highly correlated. However, from its appearance, the archived Sheenjek chronology seems to have been calculated with very flexible splines (rather than “stiff” ModNegExp), which has attenuated the “low frequency” variability observed in the middle panel using the ModNegExp option (a quick way to see this effect is sketched after this list).
  • We used the ak033.crn version in our sensitivity study. If the same exercise were repeated using the middle panel version, it would yield relatively high early 15th century results.
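The attenuation effect can be seen with a few lines of dplR, assuming the NOAA measurement file; the 32-year spline stiffness is my assumption for illustration, since the actual spline used for ak033.crn is unknown.

    library(dplR)

    rwl <- read.rwl("ak033.rwl")   # Sheenjek measurement data

    # "Stiff" Jacoby-style fit versus a flexible spline; the flexible spline
    # removes much of the century-scale variability along with the age trend
    crn_stiff    <- chron(detrend(rwl, method = "ModNegExp"))
    crn_flexible <- chron(detrend(rwl, method = "Spline", nyrs = 32))

    par(mfrow = c(2, 1))
    plot(crn_stiff, add.spline = TRUE, nyrs = 30)
    plot(crn_flexible, add.spline = TRUE, nyrs = 30)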

It is not presently known who chopped off Sheenjek values prior to AD1580 in the MBH98 version. Or why.

All cores in the Sheenjek dataset were included in D’Arrigo et al 2006 NWNA Composite.

 

Four Twelve, Alaska: A Jacoby Series

Four Twelve (Alaska) was one of the 11 Jacoby and D’Arrigo series used in MBH98. In our original 2003 article, we observed that the MBH98 version of this chronology differed substantially from the chronology officially archived at NOAA, and, in our sensitivity study, used the archived version (after using the MBH version for benchmarking.)  Among other things, Mann objected vehemently to the very idea of the sensitivity analysis that we had carried out:

An audit involves a careful examination, using the same data and following the exact procedures used in the report or study being audited.  McIntyre and McKitrick (“MM03”) have done no such thing, having used neither the data nor the procedures of MBH98. Their effort has no bearing on the validity of the conclusions reported in MBH98, and is no way a “correction” of that study as they claim. On the contrary, their analysis seriously misrepresents MBH98. 

However, the different Jacoby versions were a secondary issue in the contemporary debate and the inconsistency between MBH98 and Jacoby versions wasn’t pursued further at the time.  

Analysis was further frustrated by peculiar inconsistencies in the Jacoby archive itself.  For Four Twelve (and several other sites), the archived chronology ended in 1990, whereas archived measurement data ended in 1977.  The period of the archived measurement data was consistent with the period of the MBH98 version of Four Twelve (treeline1.dat), but there was no measurement archive corresponding to the archived chronology.  It was the sort of dog’s breakfast that was all too typical.  Jacoby’s death-bed archive once again provides an answer (as discussed below).

First, here is a comparison of the MBH98 chronology (treeline1.dat) versus the chronology calculated from the ak031.rwl measurement data (covering exactly the same period) using Bunn’s ModNegExp option (which corresponds to contemporary Jacoby methodology).  The two chronologies are highly correlated and cover the same period, but the elevated mid-20th century values of the MBH98 version were not replicated.   I presume that the MBH98 version came from Jacoby and/or D’Arrigo and that this version was used in Jacoby and D’Arrigo 1989 as well.  Mann’s composite of Jacoby and D’Arrigo treeline series was also used for the MBH99 bodge of the North American PC1 (to shave down the blade to “get” a passing RE – as Jean S showed long ago).

One of the “new” measurement datasets in Jacoby’s death-bed 2014 archive was ak109.rwl, described as a Four Twelve Update. It covered exactly the same period as the ancient ak031.crn chronology (1524-1990). Application of the ModNegExp chronology algorithm yielded an almost exact replication of the archived chronology, as shown below.  This confirms (1) that the ak031.crn chronology was derived from the ak109.rwl measurement data – an inconsistency unique to the Jacoby treeline data; and (2) that the ModNegExp algorithm is a reliable equivalent to the methodology used by Jacoby for the chronologies archived in the early 1990s.
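The replication check is simple enough to show in a few lines of dplR (file names per the NOAA/ITRDB identifiers above):

    library(dplR)

    rwl  <- read.rwl("ak109.rwl")                      # 2014 "Four Twelve Update" data
    calc <- chron(detrend(rwl, method = "ModNegExp"))  # freshly calculated chronology
    arch <- read.crn("ak031.crn")                      # chronology archived in early 1990s

    yrs <- intersect(rownames(calc), rownames(arch))
    cor(calc[yrs, 1], arch[yrs, 1])   # near 1 for an (almost) exact replication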

Inconsistent Information on Updates

In the early 1990s, Jacoby updated multiple sites in the northern treeline network published in 1989.  In this article, I’ve commented on the Four Twelve Update in 1990, for which the chronology was archived in the early 1990s, but the measurement data not until 2014, more than 20 years later and more than 30 years since the original collection.

A much more troubling example (cited in early Climate Audit articles) was the corresponding update for Gaspe, also carried out in the early 1990s, where the measurement data yielded a totally different result than the big-bladed hockey stick used in MBH98, but was withheld by Jacoby et al for a further 20+ years, until a few months before Jacoby’s death.

D’Arrigo et al 2006 NWNA Composite

Four Twelve (Alaska) is one of four sites that contribute to the D’Arrigo et al 2006 Northwest Alaska (NWNA) Composite, illustrated below. However, the NWNA Composite goes up in its closing period, as opposed to the closing decline of both Four Twelve versions.  Curiously, the NWNA Composite only uses the second (1990) tranche of Four Twelve measurement data, excluding the original (1970s) tranche, whereas for nearby Arrigetch, it incorporated both tranches. 

An odd inconsistency. I’ll look at the D’Arrigo NWNA Composite in due course. 

Discovery of Data for One of the “Other 26” Jacoby Series

We’ve long discussed the bias imparted by ex post selection of data depending on whether it went up in the 20th century, likening such after-the-fact selection to a drug study carried out only on survivors.

The Jacoby and D’Arrigo 1989 network was a classic example: the original article reported that they had sampled 36 northern treeline sites, from which they selected 10 with the “best record…of temperature-influenced tree growth”, to which they added a chronology of Gaspe cedars that was far south of the northern treeline at low altitudes.

In 2004 and 2005, I made a determined effort (link) to obtain the measurement data for the 26 sites that weren’t included in the final calculation.  Jacoby refused. I tried over and over to get this data, but was never successful.

Gordon Jacoby died in October 2014.  In June 2014, a few months prior to his death, the Lamont Doherty Earth Observatory unit of Columbia University (Jacoby’s employer) archived a large collection of tree ring data collected by Jacoby and associates (link).  By then, it was 25 years since publication of Jacoby and D’Arrigo 1989 and 8 years since publication of D’Arrigo et al 2006.

By then, the paleoclimate community had “moved on” to the seeming novelties of PAGES2K. A few Jacoby and D’Arrigo series re-appeared in PAGES2K. I wrote a couple of articles on these new Jacoby and D’Arrigo avatars: on their Central Northwest Territories (Canada) series in January 2016 here; and on their Gulf of Alaska series in February 2016 here and here. But the articles attracted little interest. Jacoby and D’Arrigo had successfully stonewalled availability of data until no one was interested any more.  Not even me.

However, while recently refreshing myself on ancient MBH98 issues, I discovered something interesting: buried in the dozens of measurement data sets in the belated 2014 archive was one of the datasets that Jacoby had withheld back in 2004. (Thus far, I’ve only found one, but there may be others.)  It was a northwest Alaska dataset collected in 1979.  What did the withheld data show? Despite the passage of time, I was interested.

Long-time readers will undoubtedly recall Jacoby’s classic data refusal:

We strive to develop and use the best data possible. The criteria are good common low and high-frequency variation, absence of evidence of disturbance (either observed at the site or in the data), and correspondence or correlation with local or regional temperature. If a chronology does not satisfy these criteria, we do not use it. The quality can be evaluated at various steps in the development process. As we are mission oriented, we do not waste time on further analyses if it is apparent that the resulting chronology would be of inferior quality.

If we get a good climatic story from a chronology, we write a paper using it. That is our funded mission. It does not make sense to expend efforts on marginal or poor data and it is a waste of funding agency and taxpayer dollars. The rejected data are set aside and not archived.

As we progress through the years from one computer medium to another, the unused data may be neglected. Some [researchers] feel that if you gather enough data and n approaches infinity, all noise will cancel out and a true signal will come through. That is not true. I maintain that one should not add data without signal. It only increases error bars and obscures signal.

As an ex-marine I refer to the concept of a few good men.

A lesser amount of good data is better without a copious amount of poor data stirred in. Those who feel that somewhere we have the dead sea scrolls or an apocrypha of good dendroclimatic data that they can discover are doomed to disappointment. There is none. Fifteen years is not a delay. It is a time for poorer quality data to be neglected and not archived. Fortunately our improved skills and experience have brought us to a better recent record than the 10 out of 36. I firmly believe we serve funding agencies and taxpayers better by concentrating on analyses and archiving of good data rather than preservation of poor data.

They may also recall Rosanne D’Arrigo’s remarkable 2006 presentation to a dumbfounded NAS Panel, to whom she explained that you had to pick cherries if you want to make cherry pie, as I reported at the time (link):

D’Arrigo put up a slide about “cherry picking” and then she explained to the panel that that’s what you have to do if you want to make cherry pie. The panel may have been already reeling from the back-pedalling by Alley and Schrag, but I suspect that their jaws had to be re-lifted after this. Hey, it’s old news at climateaudit, but the panel is not so wise in the ways of the Hockey Team. D’Arrigo did not mention to the panel that she, like Mann, was not a statistician, but I think that they already guessed.

D’Arrigo et al (2006) was relied upon by both NAS Panel and IPCC AR4, but, once again, D’Arrigo refused to provide measurement data – even when politely asked by Gerry North, chair of the NAS Panel.

Sukak Peak (ak106)

The measurement data for ak106.rwl (link), Sukak Peak, Alaska, showed that it had been sampled in 1979.  It was at the same latitude (67-68N) in NW Alaska as the three Alaska sites used in Jacoby and D’Arrigo 1989 (Four Twelve, Arrigetch, Sheenjek) and was located about halfway between Arrigetch (151W) and Sheenjek (144W).

It seems virtually certain that this was one of the “other 26” sites that Jacoby had sampled prior to Jacoby and D’Arrigo 1989, but had excluded from the study and then vehemently refused to provide.

Here is a chronology for Sukak Peak (ak106) using Andy Bunn’s dplR (ModNegExp option to emulate Jacoby methodology), produced using Bunn’s dplR plot function: chronology in solid line (left axis scale), core counts in light grey (right axis scale):

First, the chronology (dark line) had elevated values in the AD1100s; its 20th century values were unexceptional and declined through the 20th century, with closing values indistinguishable from the long-term average.  It definitely doesn’t tell the “climatic story” that Jacoby was trying to tell.

Second, and this is a surprise (or maybe not), the core counts – shown in solid light grey in the above graphic – show that Sukak Peak had 10 cores by AD1311 and 5 cores by AD1104. In contrast, the entire Jacoby network incorporated into MBH98 had only one core (from Gaspe) prior to AD1428 and none prior to AD1404. In other words, although this data had been withheld by Jacoby, replication at this site was better than at any other Jacoby and D’Arrigo site used in MBH98. It was not “lower quality” in any objective sense.

Although the Sukak Peak data was still unarchived and unpublished in 2006, it was used in the D’Arrigo et al 2006 NW Alaska (NWNA) Composite dataset, the chronology of which reported high late 20th century values – the opposite of what is displayed in this component. The NWNA Composite also included subsets of Four Twelve, Arrigetch and Sheenjek (none of which show high late 20th century values) and a later dataset from Dalton Highway, with which I’m presently unfamiliar.  I will take a look at this dataset in a follow-up post.

In closing, I had long presumed that data for the “other 26” Jacoby and D’Arrigo northern treeline sites had disappeared forever.  But it turns out that data for one of the sites was archived in 2014 – 35 years after collection in 1979, 25 years after publication of Jacoby and D’Arrigo 1989 and a mere 16 years after publication of MBH98.

Plus another 9 years before anyone noticed that Jacoby’s death-bed archive contained one of the long withheld “other 26” sites.  A pleasant surprise nonetheless.  But definitely not a surprise to discover that the withheld data did not have a hockey stick shape.

 

MBH98 Weights – an Update

In numerous ancient Climate Audit posts, I observed that all MBH98 operations were linear and that the step reconstructions were therefore linear combinations of proxies, the coefficients of which could be calculated directly from the matrix algebra (described in a series of articles).   Soderqvist’s identification of the actual proxies enables calculation of the AD1400 weights by regression of the two “glimpses” of the AD1400 step (1400-1449 in the spliced reconstruction and 1902-1980 in the Dirty Laundry data) against the proxy network. The regression information is shown in an Appendix at the end of this post.
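The regression itself is nothing exotic. A sketch, with object names of my own choosing: “glimpse” holds the 129 available values of the AD1400 step (1400-1449 from the splice plus 1902-1980 from the Dirty Laundry data) and “proxies” is Soderqvist’s 22-column AD1400 network over the same years.

    # Recover the implied weights by ordinary least squares
    fit <- lm(glimpse ~ ., data = proxies)

    summary(fit)$r.squared   # ~0.9999 when the network identification is correct
    coef(fit)[-1]            # implied weight of each (scaled) proxy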

The figure below shows the weights for (scaled) proxies as follows: left – weights from my previous (ancient) calculations from “first principles”; right – from regression of reconstruction “glimpses” against Soderqvist identification network.

I haven’t yet tried to retrace my linear algebra using the new identification. The linear algebra used in the diagram at left also reconciles to five nines with the Wahl-Ammann calculation, so it can safely be construed as the weights for the AD1400 network as listed in the Nature SI – but not for the actual MBH98 network, whose weights are shown on the right.

Within the overall similarity, there are some interesting differences in weights arising from the use of four lower order NOAMER (pseudo-) PCs rather than four tree ring series from Morocco and France.  The problematic Gaspe series (what Mark Steyn referred to in his deposition as the “lone pine”) receives nearly double the weighting in the MBH98 data as actually used, as opposed to the incorrect listing at Nature.  Also, the NOAMER PC6 is almost as heavily weighted as the notorious Mannian PC1.  It will be interesting to see how heavily the Graybill stripbark bristlecones and other data that Mann had analysed in his CENSORED directory feature in this other heavily weighted PC. My guess is that combination of principal components and inverse regression will show the heavy weighting of stripbark bristlecones and downweighting of other data that we pointed out almost 20 years ago. 

The contribution of individual North American species to the MBH AD1400 reconstruction can be calculated from the eigenvectors.  In the Mannian PC1, nearly all of the sites (and thus species) have positive coefficients, though, as discussed many years ago, the stripbark species (bristlecones PILO and PIAR; foxtail PIBS) are the most heavily weighted.  When six PCs are used in the MBH98 algorithm, Douglas fir (PSME) is flipped to a negative orientation.

Appendix

Below is the output from a simple regression of MBH98 AD1400 “glimpses” (AD1400-1449 from the splice and AD1902-1980 from the Dirty Laundry data) against Soderqvist’s identification of the actual network. The R^2 of the reverse engineering is 0.9999 with significance less than 2e-16 for all but one proxy (seprecip-nc).  A small bit of untidiness with seprecip-nc, but de minimis.

Also, for reference, here is a diagram of AD1400 proxy weights from late 2007 (link).  I discussed these weights in a couple of subsequent presentations. 

[Figure: bigpro23.gif – AD1400 proxy weights from late 2007]

Mann’s Other Nature Trick

In today’s post, I will report on some excellent work on MBH98 by Hampus Soderqvist, who discovered an important but previously unknown Mike’s Nature Trick: Mann’s list of proxies  for AD1400 and other early steps was partly incorrect (Nature link now dead – but see  NOAA or here).  Mann’s AD1400 list included four series that were not actually used (two French tree ring series and two Moroccan tree ring series), while it omitted four series that were actually used.  This also applied to his AD1450 and AD1500 steps.  Mann also used an AD1650 step that was not reported.

Soderqvist’s discovery has an important application.

The famous MBH98 reconstruction was a splice of 11 different stepwise reconstructions, with steps ranging from AD1400 to AD1820. The proxy network in the AD1400 step (after principal components) consisted of 22 series, increasing to 112 series (after principal components) in the AD1820 step.  Mann reported several statistics for the individual steps but, as discussed over and over, withheld the important verification r2 statistic.  By withholding the results of the individual steps, Mann made it impossible for anyone to carry out routine statistical tests on his famous reconstruction.

However, by reverse engineering of the actual content of each network, Soderqvist was also able to calculate each step of the reconstruction – exactly matching each subset in the spliced reconstruction.  Soderqvist placed his results online at his github site a couple of days ago and I’ve collated the results and placed them online here as well.  Thus, after almost 25 years, the results of the individual MBH98 steps are finally available.

Remarkably, Soderqvist’s discovery of the actual composition of the AD1400 network (and other early networks) sheds new light on the controversy about principal components that animated Mann’s earliest realclimate articles, posted on December 4, 2004 as realclimate was unveiled. Both articles were attacks on us (McIntyre and McKitrick) while our GRL submission was under review and while Mann was seeking to block publication. Soderqvist’s work shows that some of Mann’s most vehement claims were untrue, but, oddly, untrue in a way that was arguably unhelpful to the argument that he was trying to make. It’s quite weird.

Soderqvist is a Swedish engineer, who, as @detgodehab, discovered a remarkable and fatal flaw in the “signal-free” tree ring methodology used in PAGES2K (see X here).  Soderqvist had figured this out a couple of years ago. But I was unaware of this until a few days ago when Soderqvist mentioned it in comments on a recent blog article on MBH98 residuals.

The Stepwise Reconstructions

Mann et al (1998) reported that the reconstruction consisted of 11 steps and, in the original SI (current link), reported the number of proxies (some of which were principal component series) for each step – 112 in the AD1820 network and 22 in the AD1400 network.  As we later observed, the table of verification statistics did not include Mann’s verification r2 results. Verification r2 is one of the most commonly used statistics and is particularly valuable as a check against overfitting in the calibration period.

Although Mann claimed statistical “skill” for each of the eleven steps, he did not archive results of the 11 individual step reconstructions. In 2003, we sought these results, ultimately filing a formal complaint with Nature. But, to its continuing discredit, Nature supported Mann’s withholding of these results.  Despite multiple investigations and litigations, Mann has managed to withhold these results for over 25 years.

Nor did Mann’s original SI list the proxies used in each step.  In April 2003, I asked Mann for the location of the FTP site containing the data used in MBH98.  Mann replied that he had forgotten the location but that his associate Scott Rutherford would respond. Subsequently, Rutherford directed me to a location on Mann’s FTP site which contained a collation of 112 proxies (datestamped July 2002), many of which were principal component series of various tree ring networks. It’s a long story that I’ve told many times. In the 1400-1449 period of Rutherford’s collation, there were 22 “proxies”, including two North American PCs.

In October 2003 (after asking Mann to confirm that the data provided by Rutherford was the data actually used in MBH98), we published our first criticism of MBH98. Mann said that we had used the “wrong” data and should have asked for the right data. Mann also provided David Appell with a link to a previously unreported directory at Mann’s FTP site, most of which was identical to the directories in the Climategate zipfile that Soderqvist subsequently used. This FTP location was dead from at least 2005 on and there is no record of it in the Wayback Machine. (Its robots.txt file appears to have prevented indexing.)  At the time, Mann also said that MBH98 had used 159 series, not 112 series.  We asked Mann to identify the 159 series. Mann refused. (There was much other controversy).

Ultimately, we filed a Materials Complaint with Nature asking them, inter alia, to (1) require Mann to identify the 159 series actually used in MBH98  and (2) provide the results of the individual steps (described as “experiments” in the SI).  Nature, to its shame, refused to require Mann to provide the results of the individual steps (which remain withheld to this day), but did require him to provide a list of the proxies used in each step.   In the AD1400 network, it included the four French and Moroccan tree ring series and two North American PCs. This list was published in July 2004 and has been relied on in subsequent replication efforts.

Although Mann refused to provide results of individual steps, the archived reconstruction (link) is a splice of the 11 steps, using the results of the latest step where available. Its values between 1400 and 1449 thus provide a 50-year glimpse of the AD1400 reconstruction. This is a long enough period to test whether any proposed replication is exact; a minimal version of the test is sketched below. (I recently noticed that the Dirty Laundry data in the Climategate archive provides a second glimpse – of values between 1902 and 1980 – for the AD1400 and AD1600 networks.)
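The exactness test is trivial once stated; a sketch, with “splice” and “emulation” as placeholder names of my own for year-indexed (ts) series:

    glimpse   <- window(splice,    start = 1400, end = 1449)
    candidate <- window(emulation, start = 1400, end = 1449)
    max(abs(candidate - glimpse))   # ~0 (rounding only) for an exact replication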

At different times, McIntyre-McKitrick, Wahl-Ammann and Climate Audit readers Jean S and UC tried to exactly replicate the individual steps in the spliced MBH98 results, but none of us succeeded. When Wahl-Ammann published their code, I was able to reconcile their results to our results to five nines accuracy within a few days of their code release (e.g. link, link). It ought to have been possible to exactly reconcile to MBH98 results, but none of us could do so. The figure below (from May 2005) shows the difference between the Wahl-Ammann version and MBH98 version. At times, the differences are up to 1 sigma.  To be clear, the shape of the replication – given MBH data and methods – was close to MBH98 values, but there was no valid reason why it couldn’t be replicated exactly and, given the effort to get to this point, each of us wanted to finish the puzzle.

In 2006, Wegman wryly observed  that, rather than replicating Mann and disproving us, Wahl and Ammann had reproduced our calculations.

Around 2007, Jean S and UC both tried unsuccessfully to replicate the MBH98 steps. I had posted up scripts in R in 2003 and 2005; UC posted up a clean script in Matlab for MBH replication. Jean S speculated that Mann’s list of proxies must be incorrect, but we all eventually gave up.

A few years ago, Soderqvist noticed UC’s script for MBH98 and began reverse engineering experiments in which he augmented the AD1400 network with other candidate proxies available in the Climategate documents (mbh-osborn.zip).  This included many series that were not available in the Nature, NOAA or Penn State supplementary information (but had, at one time, been in the now-dead UVA archive that was temporarily available in late 2003 and early 2004).

By October 2021, Soderqvist had determined that the AD1400 and AD1450 proxy lists were incorrect, and he contacted Mann, pointing out the errors and the corrections required to the SI:

For the AD 1400 and AD 1450 steps, the reconstruction is not a linear combination of the archived proxies. The correct proxy lists can be determined by adding available proxies until the reconstruction is in their linear span. It turns out that PCs 3 to 6 of the NOAMER network have been replaced with proxies that were not used in these time steps. For comparison, the follow-up paper “Long-term variability in the El Niño/Southern Oscillation and associated teleconnections” lists the first six PCs (table 1, entries 89-94).

There is also an undocumented AD 1650 step with its own set of proxies. It is just the AD 1600 set with some additional proxies.
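The linear-span test that Soderqvist describes can be sketched in a few lines (my own formulation, not his code): regress the reconstruction glimpse on a candidate proxy set, and if the residuals vanish to rounding, the reconstruction lies in the span of those proxies.

    in_span <- function(recon, X, tol = 1e-6) {
      fit <- lm(recon ~ as.matrix(X))
      max(abs(residuals(fit))) < tol   # tolerance allows for archive rounding
    }
    # e.g. FALSE for the SI-listed network; TRUE once NOAMER PCs 3-6 are
    # swapped in for the French and Moroccan series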

Instead of issuing a Corrigendum or otherwise correcting the SI, Mann and associates buried the information deep in a Penn State archive (see link).  The covering text cited Soderqvist, together with Wahl and Ammann, as “two emulations” of the MBH98 reconstruction, ostentatiously failing to mention our original emulation of the MBH98 reconstruction (which exactly reconciled to the later Wahl-Ammann version: see link; link) or emulations by UC or Jean S, on which Soderqvist had relied.

Two more years passed.

Earlier this year, I corresponded with and collaborated with Soderqvist (@detgodehab on Twitter) on his remarkable discovery of a fatal flaw in the popular “signal-free” tree ring methodology used in PAGES2K and now widely popular (see X here).

A few days ago, I posted a thread on MBH98 residuals (link) in which I observed that several datasets connected with the notorious Dirty Laundry email contained 1902-1980 excerpts from the MBH98 AD1400 and AD1600 steps that had not been previously identified as such.  Soderqvist commented on the thread, pointing out (in passing) a quirky Mannian error in the calculation of average temperatures that no one had noticed in the previous 25 years.

Impressed once again by his reverse engineering acumen, I posed (or thought that I was posing) the longstanding mystery of reverse engineering the actual list of MBH98 proxies used in the AD1400 step as something that might interest him.  I even suggested that the NOAMER PC3 might be involved somehow (on the basis that it was used in the AD1000 step and might have been used in AD1400 step.)

As it turned out, Soderqvist had not only thought about the problem, but had figured it out. And the PC3 was involved.

The information at his github site showed that the four series listed in the SI but not actually used were two French tree ring series and two Moroccan tree ring series.  They were also listed in the AD1450 and AD1500 networks, but do not appear to have been actually used until the AD1600 network.

A few days ago, Soderqvist archived the results of the individual steps at his github (see link here). I checked his AD1400 results against the 1400-1449 excerpt in the splice version and the 1902-1980 excerpt in the Dirty Laundry data, and the match was exact.  I’ve additionally collated his results into an xlsx spreadsheet in a second archive here: https://climateaudit.files.wordpress.com/2023/11/recon_mbh-1.xlsx.

So, after all these years, we finally have the values for the individual MBH98 steps that Mann and Nature refused to provide so many years ago.

New Light on An Old Dispute

But there’s another reason why this particular error in listing proxies (claiming use of two North America PCs, rather than the six PCs actually used) intrigued me.

During the original controversy, Mann did not merely list use of two NOAMER PCs in obscure Supplementary Information: he vehemently and repeatedly asserted that he had used two North American PCs in the AD1400 network because that was the “correct” number to use under “application of the standard selection rules”. It was a preoccupation at the opening of Realclimate in December 2004, when Mann was attempting to block publication of our submission to GRL.

For example, the very first article (scroll through 2004 archives to page 9 link) in the entire Realclimate archive, dated November 22, 2004 – almost three weeks before Realclimate opened to the public on December 10, 2004 – is entitled PCA Details: PCA of the 70 North American ITRDB tree-ring proxy series used by Mann et al (1998). In it, Mann stated that two North American PCs were used in the AD1400 network based on “application of the standard selection rules” applied to short-centered data.

Realclimate opened on December 10, 2004 (link) and, on opening, featured two attacks on us by Mann (link; link) entitled False Claims by McIntyre and McKitrick regarding the Mann et al (1998) reconstruction and Myth vs Fact Regarding the “Hockey Stick“.   Both were dated December 4, 2004.  Mann cited our Nature submission as the target of his animus.

In these earliest Realclimate articles, (link; link) Mann vehemently asserted (linking back to the PCA Details article) that they had used two PC series in the MBH98 AD1400 network by application of Preisendorfer’s Rule N to principal components calculated using “MBH98 centering” i.e. Mann’s incorrect short centering:

The MBH98 reconstruction is indeed almost completely insensitive to whether the centering convention of MBH98 (data centered over 1902-1980 calibration interval) or MM (data centered over the 1400-1971 interval) is used. Claims by MM to the contrary are based on their failure to apply standard ‘selection rules’ used to determine how many Principal Component (PC) series should be retained in the analysis. Application of the standard selection rule (Preisendorfer’s “Rule N’“) used by MBH98, selects 2 PC series using the MBH98 centering convention, but a larger number (5 PC series) using the MM centering convention.

In an early Climate Audit article (link), I tested every MBH98 tree ring network and step using Preisendorfer’s Rule N and was unable to replicate the numbers of retained PCs reported in the SI using that rule.
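For readers unfamiliar with the rule, a minimal simulation version runs like this (my own sketch; Mann’s actual procedure remains undisclosed). Each eigenvalue of the correlation matrix of the network is compared to the corresponding percentile from same-sized matrices of white noise:

    rule_n <- function(X, nsim = 1000, conf = 0.95) {
      # observed eigenvalues of the correlation matrix (X: years x sites,
      # assumes more years than sites)
      obs <- prcomp(X, scale. = TRUE)$sdev^2
      # null distribution from white-noise matrices of the same dimensions
      null <- replicate(nsim,
        prcomp(matrix(rnorm(nrow(X) * ncol(X)), nrow(X)), scale. = TRUE)$sdev^2)
      crit <- apply(null, 1, quantile, probs = conf)
      # retain PCs from the top down until one fails to beat the null percentile
      fail <- which(obs <= crit)
      if (length(fail) == 0) length(obs) else fail[1] - 1
    }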

Soderqvist’s discovery that MBH98 used six North American PCs not only refutes Mann’s claim that he used two North American PCs, but also refutes his claim that he used Preisendorfer’s Rule N to select PCs.  It raises a new question: how did Mann decide to retain six North American PCs in the AD1400 network? It obviously wasn’t Preisendorfer’s Rule N. So what was the procedure? Mann has never revealed it.

Subsequent to the original controversy, I’ve written many Climate Audit posts on properties of principal components calculations, including (some of what I regard as the most interesting) posts on Chladni patterns arising from principal components applied to spatially autocorrelated tree ring series.  The takeaway is that, for a large-scale temperature reconstruction, one should not use any PCs below the PC1.  The reason is blindingly obvious once stated: the PC2 and lower PCs contain negative signs for approximately half the locations, i.e. they flip the “proxies” upside down.  If the tree ring data are indeed temperature “proxies”, they should be used in the correct orientation. Thus, no need for lower order PCs. In many important cases, the PC1 is similar to a simple average of the series.  Lower order PCs tend to be contrasts between regional groupings.  In the North American network, southeastern US cypress form a grouping that is identifiable in the PC5 (centered) and, needless to say, the stripbark bristlecones form another distinct grouping.
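The sign-flip point is easy to demonstrate with synthetic data: for positively intercorrelated series, the PC1 loadings all share one sign, while the PC2 loadings necessarily split the sites into opposite-signed halves.

    # 20 noisy "sites" sharing a common signal
    set.seed(2)
    common <- rnorm(500)
    X <- sapply(1:20, function(i) common + rnorm(500, sd = 2))

    ev <- eigen(cor(X))$vectors
    table(sign(ev[, 1]))   # all one sign: PC1 is close to an average of the sites
    table(sign(ev[, 2]))   # mixed signs: PC2 flips roughly half the sites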

In these articles, Mann also observed that, under MM05 (correct) centering, the “hockey stick” pattern appeared in the PC4.  For the subsequent inverse regression step of MBH98 methodology, it didn’t matter whether the hockey stick pattern appeared in the PC1; inclusion even as a PC4 was sufficient to impart a hockey stick shape to the resulting reconstruction:

Although not disclosed by MM04, precisely the same ‘hockey stick’ PC pattern appears using their convention, albeit lower down in the eigenvalue spectrum (PC#4) (Figure 1a). If the correct 5 PC indicators are used, rather than incorrectly truncating at 2 PCs (as MM04 have done), a reconstruction similar to MBH98 is obtained

Being a distinct regional pattern does not prove that the pattern is a temperature proxy.  “Significance” under Rule N is, according to Preisendorfer himself, merely an “attention getter, a ringing bell… a signal to look deeper, to test further”.  See our discussion of Preisendorfer here.

The null hypothesis of a dominant variance selection rule [such as Rule N] says that Z is generated by a random process of some specified form, for example a random process that generates equal eigenvalues of the associated scatter [covariance] matrix S…  One may only view the rejection of a null hypothesis as an attention getter, a ringing bell, that says: you may have a non-random process generating your data set Z. The rejection is a signal to look deeper, to test further.

Our response has always been that the relevant question was not whether the hockey stick pattern of the stripbark bristlecones was a distinctive pattern within the North American tree ring network, but whether this pattern was local and specialized, as opposed to an overall property; and, if local to stripbark bristlecones, whether the stripbark bristlecones were magic world thermometers. The 2006 NAS panel recommended that stripbark bristlecones be avoided in temperature reconstructions, but their recommendation was totally ignored.  They continued in use in Mann et al 2008, PAGES2K and many other canonical reconstructions, none of which are therefore independent of Mann et al 1998-99.

While most external attention on MBH98 controversy has focussed on principal component issues, when I reviewed the arc of Climate Audit posts in 2007-2008 prior to Climategate, they were much more focused on questions pertaining to properties of the inverse regression step subsequent to the principal components calculation and, in particular, to overfitting issues arising from inverse regression.  Our work on these issues got sidetracked by Climategate, but there is a great deal of interesting material that deserves to be followed up on.

MBH98 Confidence Intervals

Continued from here.

The Dirty Laundry residual datasets for AD1000, AD1400 and AD1600 were each calculated using Mann’s “sparse” instrumental dataset, but the resultant sigmas and RE(calibration) statistics don’t match reported values.   In contrast, the Dirty Laundry residual dataset for the AD1820 step, which was calculated by Tim Osborn of CRU because Mann “couldn’t find” his copy of the AD1820 residual data,  used a different MBH98 target instrumental dataset – the “dense” instrumental series.

Question: is it possible that Mann had two versions of the residual data: sparse and dense? And that he chose the dense version for MBH98 statistics (sigma, RE_calibration) because it yielded “better” statistics, but inadvertently sent the sparse version (with worse values) to Osborn?

This appears to be exactly what happened. If one uses the Dirty Laundry values for the reconstruction in 1902-1980 versus the MBH98 dense temperature series, one gets an exact replication of the reported MBH98 calibration RE and sigma (standard error of residuals) for the AD1400 and AD1600 steps, and of the reported MBH99 calibration RE for the AD1000 step.

Conclusion: We KNOW that MBH98 calculated residual series using the sparse target because they were sent to Osborn in the Dirty Laundry email and shown in the MBH99 submission Figure 1a.  We KNOW that MBH98 calculated residual series using the dense target because of the reported RE_calibration and sigma values in MBH98.  The corollary is that MBH98 calculated two sets of residual series and then selected the “better” values for display without disclosing the worse values. Or the selection operation.

MBH99 confidence intervals are related to MBH98 confidence intervals, but different. They were a longstanding mystery during the heyday of the Climate Audit blog.  In the next post, I’ll review MBH99 confidence intervals. We’re a bit closer to a solution, and maybe a reader will be able to figure out the balance.

Over and above this particular issue is another, even more fundamental issue: the use of calibration period residuals to estimate confidence intervals when there is a massive failure of verification period r^2 values.  Prior to Climategate, I had written several posts and comments raising the problem of massive overfitting in the calibration period through a little-discussed MBH98/99 step involving a form of inverse regression (closer to PLS regression than to OLS regression – some intuitions of OLS practitioners have to be set aside).   There are some very interesting issues and problems arising from this observation, and even some points of potential mathematical interest. I’ll try to elaborate on this in a future post.

Postscript

There is a substantial and surprising difference between the two MBH98 target instrumental series (see diagram below).  The sparse series, according to MBH98, is the “subset of the gridded data (M′ = 219 grid-points) for which independent values are available from 1854 to 1901″; the dense series is calculated for 1902-1980 from 1082 gridcells.  In the 1902-1980 (MBH calibration) period, there is considerably more variability in the sparse series.

“Dirty Laundry” Residuals

Continued from the previous post (link).

The data associated with the Climategate “dirty laundry” email had other interesting information on Mann’s calculation of confidence intervals and the related calculation of RE statistic.  This post draws heavily on offline comments by Jean S and UC, both long before and after Climategate.

The left panel below is Tim Osborn’s summer 2003 plot of the AD1000 residuals in one of the “dirty laundry” datasets sent to him by Mann. It matches the AD1000 data in the right panel – Figure 1a in the submission version of Mann et al 1999.  UC had noticed this figure in the submission version in 2006 or so.

While the plot of calibration residuals was not carried forward into the published version of Mann et al 1999, an identical figure showing the spectrum of calibration residuals appears in both versions (see below). This almost certainly precludes an unreported switch of the calculation of calibration residuals between submission version and publication (though, with Mann et al, nothing can be excluded.)

Here’s the rub: the standard errors reported in MBH98 and MBH99 are lower (and the corresponding RE_calibration values higher) than the values calculated using the Dirty Laundry data.

The RE_NH_cal(ibration) values for MBH98 were reported in its original statistical SI (link) and, for MBH99, in its running text. The MBH98 sigmas (standard error of residuals) for each step can be extracted from the archived stepwise reconstruction mannnhem.dat (NOAA link).  The standard error of residuals in the Climategate “dirty laundry” datasets (AD1000, AD1400, AD1600) can be trivially calculated. Osborn did so in his August 2003 Climategate I document entitled Mann uncertainty.docx.  I verified the calculation – values are shown below.  The calibration RE (RE_cal) is trivially calculated as 1- (se_residuals/sd_obs)^2. (The standard deviation of the target observation data used in the above Dirty Laundry datasets is 0.2511.)
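In R, with the sigma argument hypothetical and sd_obs = 0.2511 as quoted above:

    re_cal <- function(se_residuals, sd_obs = 0.2511) 1 - (se_residuals / sd_obs)^2
    re_cal(0.13)   # e.g. a sigma of 0.13 implies RE_cal of about 0.73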

Conclusions:

  • The Dirty Laundry residual datasets do NOT match the RE calibration or sigmas (standard error of residuals) reported for MBH98 (AD1400, AD1600), or the RE calibration reported for MBH99, even though the Dirty Laundry residuals for AD1000 match Figure 1 in the MBH99 submission.
  • The calculation of MBH confidence intervals was a standing puzzle in pre-Climategate discussion – see review here – and was never fully resolved.
  • While the reported numbers do not match the data in the Dirty Laundry residual datasets, the glimpses of the underlying reconstructions in the Dirty Laundry datasets provide data that can be used to finally resolve the calculation of MBH98 confidence intervals.  More on this in the next post. See here

Footnotes:

(1)  Here is a screengrab of relevant statistics in the original SI for MBH98 (link):

(2) Here is 2005 figure showing MBH98 confidence intervals for each step as extracted from the reconstruction archive:

Figure S1. Standard error (“sigma”) of MBH98 Reconstruction Steps. Calculated from confidence intervals for MBH98 reconstruction at NOAA archive here.

Mann’s “Dirty Laundry”

As the date approaches for the Mann-Steyn/Simberg libel trial, I’ve been reviewing my files on MBH98 and MBH99. It’s about 15 years since I last looked at these issues.

While revisiting these issues, I re-examined some data associated with the notorious “dirty laundry” Climategate email (link) – excerpt shown at right – which turns out to provide a glimpse of the long-obfuscated results for the AD1000, AD1400 and AD1600 steps over the 1902-1980 period.  I don’t recall this being noticed at the time.  Even by Jean S or UC.  The identification proved interesting.

Attached are the calibration residual series for experiments based on available networks back to: AD 1000, AD 1400, AD 1600. I can’t find the one for the network back to 1820! But basically, you’ll see that the residuals are pretty red for the first 2 cases, and then not significantly red for the 3rd case–its even a bit better for the AD 1700 and 1820 cases, but I can’t seem to dig them up. In any case, the incremental changes are modest after 1600–its pretty clear that key predictors drop out before AD 1600, hence the redness of the residuals, and the notably larger uncertainties farther back… You only want to look at the first column (year) and second column (residual) of the files. I can’t even remember what the other columns are! Let me know if that helps.

The data referred to in this email were located in the Climategate I directory mbh98-osborn.zip as nh-ad1400-resid.dat etc and were discussed in a draft Osborn memorandum Mann uncertainty.doc.  The columns are unlabelled.  The second column is residuals.  The AD1400 example is shown below:

An aside about Mann’s favorite “RE” statistic.  While Mann (and Wahl-Ammann) hyperventilated about the supposedly unique validation imparted by an RE statistic, the RE statistic is extremely sensitive to the choice of calibration and verification period – an issue that was never addressed by Mann or Wahl-Ammann. So, in the above example, if the calibration period were set at 1920-1960 and the verification period at 1961-1980, the RE fails miserably.  If the RE statistic is so sensitive to the choice of calibration and verification periods, it follows that the underlying reconstruction does NOT possess the claimed “robustness” or “skill”.
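For anyone who wants to check the sensitivity for themselves, the RE statistic for a given calibration/verification split is a one-liner; “obs” and “est” are placeholders for year-indexed observed and reconstructed values.

    re_stat <- function(obs, est, cal, ver) {
      cal_mean <- mean(obs[cal])   # reference "forecast" is the calibration mean
      1 - sum((obs[ver] - est[ver])^2) / sum((obs[ver] - cal_mean)^2)
    }
    # e.g. re_stat(obs, est, cal = as.character(1920:1960),
    #              ver = as.character(1961:1980))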

Continued here.